Pricing

Pay per event

China TCM News Article Scraper

Scrapes articles from 中国中医药网 (cntcm.com.cn) — the official journal of China's National Administration of Traditional Chinese Medicine. Extracts title, author, publish date, source edition, full article body, and metadata tags (TCM topics, related herbs, integrative keywords) for each article.

Developer

BowTiedRaccoon

Last modified

5 days ago

What you get

Each scraped article includes:

article_id — unique identifier from the URL
title — article headline in Chinese
category — mapped to: policy_regulation, clinical_research, herb_pharmacology, traditional_practice, news_industry, yangsheng_wellness
publish_date — as printed on the page (YYYY-MM-DD)
source — newspaper edition or column (e.g., 中国中医药报7版)
author — byline
body_text — full article body as plain text
body_html — full article body as raw HTML
tcm_topics — detected TCM keywords (针灸, 中药, 养生, etc.)
integrative_keywords — crossover wellness terms (yoga, qigong, 瑜伽, etc.)
related_herbs — Chinese herb names cited in the article
source_url — canonical article URL
scraped_at — ISO timestamp
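
The field list above can be written down as a typed record for downstream checks. This is a minimal sketch, not the actor's published schema: the names `Article`, `CATEGORIES`, and `check_record` are hypothetical, and the Python types are assumptions (in particular, that the three keyword fields arrive as lists of strings).

```python
from __future__ import annotations

import re
from typing import TypedDict

# Category values as listed in the field description above.
CATEGORIES = {
    "policy_regulation", "clinical_research", "herb_pharmacology",
    "traditional_practice", "news_industry", "yangsheng_wellness",
}


class Article(TypedDict):
    """One scraped article. Field types are assumed, not confirmed."""
    article_id: str
    title: str
    category: str
    publish_date: str            # YYYY-MM-DD
    source: str
    author: str
    body_text: str
    body_html: str
    tcm_topics: list[str]        # assumed to be a list of strings
    integrative_keywords: list[str]
    related_herbs: list[str]
    source_url: str
    scraped_at: str              # ISO timestamp

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_record(record: dict) -> list[str]:
    """Return the problems found in one record; an empty list means it passed."""
    problems = [
        f"missing field: {name}"
        for name in Article.__annotations__
        if name not in record
    ]
    category = record.get("category")
    if category is not None and category not in CATEGORIES:
        problems.append(f"unknown category: {category}")
    publish_date = record.get("publish_date")
    if publish_date is not None and not DATE_RE.fullmatch(str(publish_date)):
        problems.append(f"publish_date is not YYYY-MM-DD: {publish_date}")
    return problems
```

Running `check_record` over each dataset item before loading it elsewhere flags missing fields, unexpected categories, and malformed dates without rejecting the whole batch.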

Use cases

Clinical research: Track TCM policy announcements, trial reports, and regulatory updates from a primary Chinese-language source
Pharma regulatory monitoring: Monitor Chinese herbal medicine policy and approval news
Academic research: Sinology, comparative medicine, integrative health studies
LLM training corpora: High-quality Chinese medical text from an authoritative institutional source
Integrative medicine: TCM-yoga/qigong/meditation crossover content discovery

Inputs

Field	Type	Description	Default
`maxItems`	Integer	Maximum articles to scrape (0 = no limit)	10
`startDate`	String	Only articles published on or after this date (YYYY-MM-DD)	—
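
The two inputs above can be assembled and validated before a run. The sketch below is illustrative only: `build_input` and `is_on_or_after` are hypothetical helpers, and the "on or after" comparison shows the documented filter semantics rather than the actor's actual implementation.

```python
from datetime import date
from typing import Optional


def _parse_day(value: str) -> date:
    """Parse a strict YYYY-MM-DD string, raising ValueError otherwise."""
    parsed = date.fromisoformat(value)
    if parsed.isoformat() != value:
        # Rejects other ISO spellings that newer Python versions accept.
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return parsed


def build_input(max_items: int = 10, start_date: Optional[str] = None) -> dict:
    """Build the run input; maxItems defaults to 10 and 0 means no limit."""
    if max_items < 0:
        raise ValueError("maxItems must be 0 (no limit) or a positive integer")
    run_input = {"maxItems": max_items}
    if start_date is not None:
        _parse_day(start_date)
        run_input["startDate"] = start_date
    return run_input


def is_on_or_after(publish_date: str, start_date: str) -> bool:
    """True when an article dated publish_date passes the startDate filter."""
    return _parse_day(publish_date) >= _parse_day(start_date)
```

For example, `build_input(0, "2024-01-01")` requests every article published on or after 1 January 2024, with no cap on the number of items.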

Notes

Discovery uses the site's /sitemap.txt, which lists over 46,000 article URLs
Pages are server-rendered HTML, so no JavaScript execution is required
Crawling is polite, with modest concurrency (5 concurrent requests)
robots.txt is respected: the small number of sensitive articles listed under Disallow in robots.txt are not included in the sitemap
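
The discovery and concurrency notes above can be sketched as follows. This is an assumption-laden illustration, not the actor's code: `parse_sitemap` and `crawl` are hypothetical names, the sitemap is assumed to hold one URL per line, and `fetch` stands in for whatever HTTP client is used.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


def parse_sitemap(text: str) -> List[str]:
    """Read a plain-text sitemap: one URL per line, blanks and repeats dropped."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


async def crawl(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    concurrency: int = 5,
) -> Dict[str, str]:
    """Fetch every URL while keeping at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def fetch_one(url: str):
        # The semaphore blocks here once `concurrency` fetches are running.
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(pairs)
```

Passing the fetch function in as a parameter keeps the limiter independent of any particular HTTP library, so the same cap of 5 applies whichever client performs the requests.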
